Understanding Technical Articles and Their Diagrams
Abstract
A comprehensive system is being developed to transform scientific research papers into a knowledge base. An intelligent Scientist's Assistant is also being developed to allow a scientist to navigate through the knowledge base. Starting from paper documents, both text and graphics are captured and analyzed. We describe the entire system and then focus on the analysis of diagrams such as data graphs. Model-based image processing discovers graphical objects such as lines, polygons and text. The model-based methods are necessary to deal with occlusion, such as data points lying on data lines or data lines that collide with graph axes. The analysis is assisted by the use of a spatial index. Then a syntactic Graphics Constraint Grammar (GCG) is used to parse the collection of objects to identify higher-level structures. Each grammar rule has three components: (1) a production specifying a parent and its constituents, (2) a set of constraints, typically geometrical relations that the constituents must satisfy, and (3) a set of propagation rules that specify the mapping between properties of the parents and of the children. The complexity of the parsing and its constraint satisfaction problem is reduced by an early grouping of similar elements (such as data point symbols or tick marks on scale lines) into approximate equivalence classes by using Generalized Equivalence Relations (GERs). Computation of the GERs is very efficient due to the use of a data structure (GOSSAMER) similar to the one used at the image processing stage. A semantic GCG grammar is then used, starting with the syntactic objects already found, to generate a frame-based knowledge representation of the contents of the diagrams. The article ends by discussing the problems that will be solved and those that will remain when all documents become electronic.
A comprehensive document understanding system is described which analyzes diagrams by parsing with Graphics Constraint Grammars. Combining this with natural language analysis results in a knowledge base which represents the meaning of the document.
The design of a comprehensive document image analysis system should be driven by the ultimate use of the information in the document. If huge quantities of text are being processed for later retrieval, then handling special notation such as 2.4 × 10^7 or H₂O may not be important. Similarly, if each figure in a document is going to be accessed as a single entity, then only a raster version needs to be …
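The three-part structure of a grammar rule described in the abstract (production, constraints, propagation) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen only for illustration: the `Rule` class, the `labelled_tick` production, the dict-based representation of graphical objects, the image-coordinate convention (y grows downward), and the 5-unit tolerance. It is not the paper's GCG formalism or parser, which this page does not show.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Assumption: a graphical object is a plain dict of properties.
GraphicObject = Dict[str, object]


@dataclass
class Rule:
    parent: str                 # (1) production: parent category
    constituents: List[str]     # (1) production: child categories, in order
    constraints: List[Callable[[List[GraphicObject]], bool]]         # (2)
    propagate: Callable[[List[GraphicObject]], Dict[str, object]]    # (3)

    def apply(self, children: List[GraphicObject]) -> Optional[GraphicObject]:
        """Return the parent object, or None if the rule does not match."""
        # The children must have the categories named in the production.
        if [c["kind"] for c in children] != self.constituents:
            return None
        # Every geometric constraint must hold.
        if not all(check(children) for check in self.constraints):
            return None
        # Propagation maps child properties onto the new parent.
        parent: GraphicObject = {"kind": self.parent}
        parent.update(self.propagate(children))
        return parent


def text_below_tick(children: List[GraphicObject], tol: float = 5.0) -> bool:
    """Hypothetical constraint: the label is horizontally aligned with the
    tick (within tol) and lies below it (larger y in image coordinates)."""
    tick, text = children
    return abs(tick["x"] - text["x"]) <= tol and text["y"] > tick["y"]


# Hypothetical rule: labelled_tick -> tick text
LABELLED_TICK = Rule(
    parent="labelled_tick",
    constituents=["tick", "text"],
    constraints=[text_below_tick],
    propagate=lambda ch: {"x": ch[0]["x"], "value": float(ch[1]["string"])},
)

tick = {"kind": "tick", "x": 100.0, "y": 200.0}
near_label = {"kind": "text", "x": 102.0, "y": 215.0, "string": "2.5"}
far_label = {"kind": "text", "x": 150.0, "y": 215.0, "string": "3.0"}

matched = LABELLED_TICK.apply([tick, near_label])
rejected = LABELLED_TICK.apply([tick, far_label])
```

Here `matched` is a `labelled_tick` object that takes its position from the tick and its numeric value from the label, while `rejected` is `None` because the alignment constraint fails. Keeping constraints and propagation as separate callables mirrors the separation the abstract describes, under the assumptions stated above.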